-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
position_deletes metadata table #1615
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally LGTM! i added a few nit comments on style.
I think its a good idea to introduce the PositionalDelete class, similar to DataFile
pyiceberg/table/inspect.py
Outdated
@@ -657,3 +717,19 @@ def all_manifests(self) -> "pa.Table": | |||
lambda args: self._generate_manifests_table(*args), [(snapshot, True) for snapshot in snapshots] | |||
) | |||
return pa.concat_tables(manifests_by_snapshots) | |||
|
|||
def position_deletes(self) -> "pa.Table": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for any metadata tables referencing the current snapshot, let's add an optional param to allow querying for any given snapshot id.
pyiceberg/table/inspect.py
Outdated
|
||
from pyiceberg.io.pyarrow import schema_to_pyarrow | ||
|
||
partition_record = self.tbl.metadata.specs_struct() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: specs_struct()
looks at all PartitionSpecs, do we just need the current partition spec here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also nit: naming, this is a structtype, not the underlying record
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch , i now take the current PartitionSpec
pyiceberg/table/inspect.py
Outdated
partition_record = self.tbl.metadata.specs_struct() | ||
pa_partition_struct = schema_to_pyarrow(partition_record) | ||
pa_row_struct = schema_to_pyarrow(self.tbl.schema().as_struct()) | ||
positinal_delete_schema = pa.schema( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
positinal_delete_schema = pa.schema( | |
positional_delete_schema = pa.schema( |
pyiceberg/table/inspect.py
Outdated
pa.field("delete_file_path", pa.string(), nullable=False), | ||
] | ||
) | ||
return positinal_delete_schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return positinal_delete_schema | |
return positional_delete_schema |
pyiceberg/io/pyarrow.py
Outdated
def _read_delete_file(fs: FileSystem, data_file: DataFile, schema: "pa.Schema") -> pa.Table: | ||
delete_fragment = _construct_fragment(fs, data_file, file_format_kwargs={"pre_buffer": True, "buffer_size": ONE_MEGABYTE}) | ||
table = ds.Scanner.from_fragment(fragment=delete_fragment, schema=schema).to_table() | ||
return table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this might be a good place to start introducing a PositionalDelete
class and we can get rid of the custom _get_positional_file_schema
function
also lets add this to new table to the docs as well https://py.iceberg.apache.org/api/#inspecting-tables |
done :) |
Implements position_deletes metadata table - #1053